
# Spatiotemporal Modeling

Vivit B 16x2 Kinetics400
MIT
ViViT is an extension of the Vision Transformer (ViT) for video processing, particularly suitable for video classification tasks.
Video Processing, Transformers
google
56.94k
32
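
To make the "16x2" in the model name concrete, here is a minimal sketch of how many tokens a ViViT-style tubelet embedding would produce. It assumes "16x2" means non-overlapping tubelets of 16x16 pixels spanning 2 frames, and it assumes a 32-frame 224x224 input clip; neither value is stated on this page, and `tubelet_token_count` is a hypothetical helper, not part of any library.

```python
def tubelet_token_count(num_frames, height, width, tubelet_t, tubelet_h, tubelet_w):
    """Count the non-overlapping tubelets (tokens) that tile a video clip.

    All sizes are illustrative assumptions; check the model's own
    configuration for the real values.
    """
    if num_frames % tubelet_t or height % tubelet_h or width % tubelet_w:
        raise ValueError("clip dimensions must be divisible by the tubelet size")
    temporal_slices = num_frames // tubelet_t
    patches_per_slice = (height // tubelet_h) * (width // tubelet_w)
    return temporal_slices * patches_per_slice


# Assumed clip: 32 frames of 224x224 pixels, tubelets of 2 frames x 16x16 pixels.
tokens = tubelet_token_count(32, 224, 224, 2, 16, 16)
print(tokens)  # 16 temporal slices * 196 patches per slice = 3136
```

Under these assumptions the sequence is 16 times longer than that of a single 224x224 image split into 16x16 patches, which is why video transformers are costlier to run than their image counterparts.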
Vivit B 16x2
MIT
ViViT is an extension of the Vision Transformer (ViT) for video processing, primarily used for downstream tasks such as video classification.
Video Processing, Transformers
google
989
11
Videomae Large
VideoMAE is a self-supervised video pre-training model based on the Masked Autoencoder (MAE). It learns video representations by predicting the pixel values of masked video patches.
Video Processing, Transformers
MCG-NJU
3,243
31
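
The masking step behind the VideoMAE description can be sketched roughly as follows. This is only an illustration of "tube" masking, where the same spatial patch positions are hidden in every time slice. The 90% ratio, the 8x196 token grid, and the `tube_mask` helper are assumptions for the example, not values or functions taken from this page or from the released model.

```python
import random


def tube_mask(num_time_slices, num_spatial_patches, mask_ratio, seed=None):
    """Return a [time][patch] grid of booleans; True marks a masked patch.

    The same spatial positions are masked in every time slice, so a hidden
    patch cannot be recovered by copying it from a neighbouring frame.
    """
    rng = random.Random(seed)
    num_masked = int(round(mask_ratio * num_spatial_patches))
    masked_positions = set(rng.sample(range(num_spatial_patches), num_masked))
    slice_mask = [p in masked_positions for p in range(num_spatial_patches)]
    # Repeat the per-slice pattern along the time axis.
    return [list(slice_mask) for _ in range(num_time_slices)]


# Assumed token grid: 8 time slices x 196 spatial patches, 90% masked.
mask = tube_mask(num_time_slices=8, num_spatial_patches=196, mask_ratio=0.9, seed=0)
print(sum(mask[0]), "of 196 patches are masked in each time slice")
```

During pre-training, a model of this kind would see only the unmasked patches and be trained to reconstruct the pixel values at the masked positions.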
Video Classification Cnn Rnn
A video classification model built on a hybrid CNN-RNN architecture, intended for action recognition tasks.
Video Processing
keras-io
57
14